This report is for ETC5521 Assignment 1 by Team brolga, comprising Dhruv Nirmal and Gui Gao.

1 Introduction and motivation

1.1 Introduction:

Classic board games have been around for decades, bringing people together to play. In Greece there are many popular board game associations and ‘fan clubs’ that organise tournaments and offer a wealth of prizes.

Today, although computer games are in a golden age of development thanks to advances in technology, many great board games are still released each year and attract a lot of attention.

BoardGameGeek is a specialist board game website where users can look up board games and information about them. This information includes descriptions of the games, reviews, user ratings, professional ratings, prices, where to buy and more.

1.2 Motivation:

My teammate and I are both interested in board games and have tried many of them. Studying and analysing this large dataset can help us understand board games from a different perspective, and also help us understand the whole landscape of board games and how it has changed over time.

We have therefore dug deeper into the data and present our results as summary tables and data visualisations.

1.3 Data limitations:

  • The original data contains about 15-19 million reviews, and the way the dataset has been filtered may affect the results when analysing whether the mean ratings of the games follow a normal distribution.

  • The dataset contains some irregularly recorded data. For example, there are games with negative years and with 0 in the yearpublished variable, which may limit our analysis.
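The second limitation can be checked directly. The following is a minimal sketch, assuming a small made-up data frame `details_toy` in place of the real details data; only the column name `yearpublished` comes from the dataset, and whether irregular rows are dropped or kept is a judgment call for each analysis.

```r
library(dplyr)

# Toy stand-in for the details data (values are made up for illustration).
details_toy <- tibble(
  id = 1:6,
  yearpublished = c(1995, 0, -3000, 2017, 2010, 0)
)

# Flag years that are zero or negative so they can be inspected.
details_flagged <- details_toy %>%
  mutate(irregular_year = yearpublished <= 0)

# One possible choice: keep only rows with a positive publication year.
details_clean <- details_flagged %>%
  filter(!irregular_year)

# How many rows are affected?
count(details_flagged, irregular_year)
```

Counting the flagged rows first shows how much data a filter would remove before any of it is discarded.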

2 Data description

  1. Source of the data

Our data comes from Kaggle by way of BoardGameGeek, with a hat tip to David and Georgios. The data is available at: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25

  2. The structure of the data

We have two initial datasets, ratings and details. The initial size of the ratings dataset is around 4.8MB and contains 21631 observations and 10 columns. The original size of the details dataset is approximately 32.7MB, with 21831 observations and 23 columns.

Ratings dataset:

class      description
double     Game ID
character  Game name
double     Average rating on BoardGameGeek (1-10)

Details dataset:

class      description
double     Game ID
double     Year game was published
character  Game mechanic - how to play the game (separated by comma)
number     Minimum number of players required to play
number     Maximum number of players required to play
number     Average playing time of a game
number     People who own a game
number     People who trade a game
number     People who want to own a game
number     People who wish to own a game

3 Questions of interest

3.1 Questions list

Q1. How do board game ratings change by year of publication?

Q2. Trends in game mechanics over time.

Q3. Trends in board game publication rates over time.

Q4. What are the common game mechanics and how has their prevalence changed?

Q5. What types of games are becoming increasingly popular?

Q6. How many mechanics does each game list?

Q7. What is the distribution of ratings for the board games?

Q8. Is there a linear relationship between the year of release of a game and the average rating it receives?

Q9. Are board game descriptions more positive or negative?

Q10. What are the 10 most common words used in board game descriptions?

Q11. Is there any relationship between average game time and ownership, min/max players?

Q12. How do the number of people who wish to own a game and the number who own it plot against each other?

Q13. How do rank and average rating plot against each other? (Highly ranked games should have higher ratings.)

Q14. Which age group prefers which type of game, e.g. medical, building etc.? For example, younger children might be interested in building games and teenagers might be interested in games like Monopoly.

Q15. Are people intrigued by a particular game designer?

Q16. How did the era of electronic games (starting from 2010-11) affect the number of people who want/wish/own board games?

Q17. Is it right to assume that a board game publisher publishes a single type of board game, and do publishers focus only on a certain age group?

Q18. Are people intrigued by a particular game publisher?

Q19. How accurate has the Bayes average been?

Q20. Which era or decade played a big role in people playing more board games?

3.2 Selected questions

Q1. What are the common game mechanics and how has their prevalence changed?

Q2. What is the distribution of ratings for the board games?

Q3. Is there any relationship between the average playing time of a game and ownership (or products sold), or the min/max players required to play that game?

Q4. How did the era of electronic games (starting from 2010-11) affect the number of people who want/wish/own board games?

4 Expected findings

Q1. The most common game mechanics may be ‘Acting’, ‘Dice Rolling’ and ‘Hand Management’. I expect many game mechanics to become more popular over time.

Q2. I expect the ratings for the games to follow a normal distribution.

Q3. A longer average playing time might mean fewer owners, as people might not want to invest too much time in a game, although such games might be popular among bigger groups because board games keep a group engaged. As the required number of players increases, however, the number of products sold should drop.

Q4. Demand for board games should be lower after the start of the electronic games era.

5 Analysis and findings

5.1 Q1. What are the common game mechanics and how has their prevalence changed?

Table 5.1: The Most Common Game Mechanics
boardgamemechanic n
Dice Rolling 6112
Hand Management 4421
Set Collection 2936
Variable Player Powers 2719
Hexagon Grid 2371
Simulation 2099
Card Drafting 1869
Tile Placement 1805
Modular Board 1697
Grid Movement 1635
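A count like the one in Table 5.1 can be produced by splitting the mechanics column into one row per mechanic. The following is a minimal sketch, assuming the `boardgamemechanic` column stores each game's mechanics as a bracketed, quoted, comma-separated string; the toy data frame `details_toy` and its values are made up for illustration, and real mechanic names that contain commas or apostrophes would need extra care.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the details data (assumed string format, made-up values).
details_toy <- tibble(
  id = 1:3,
  boardgamemechanic = c(
    "['Dice Rolling', 'Hand Management']",
    "['Dice Rolling']",
    "['Set Collection', 'Dice Rolling']"
  )
)

mechanic_counts <- details_toy %>%
  # Strip the square brackets and single quotes around each mechanic.
  mutate(boardgamemechanic = gsub("\\[|\\]|'", "", boardgamemechanic)) %>%
  # One row per (game, mechanic) pair.
  separate_rows(boardgamemechanic, sep = ",\\s*") %>%
  # Count how many games list each mechanic, most common first.
  count(boardgamemechanic, sort = TRUE)

mechanic_counts
```

Adding `yearpublished` to the `count()` call would give the per-year counts needed to look at changes in prevalence.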

5.2 Q1 Conclusion

  1. According to the table and column plot above, the most common game mechanic is Dice Rolling, unsurprisingly. Dice Rolling is a mechanic that can be used in many games. It has been around for a long time, and ancient peoples could make simple dice out of stones, clay, bones, etc., so Dice Rolling is often seen as the most dominant symbol of board games (Yermolaieva & Brown, 2017).

  2. Before analysing this question, I inferred from the experience of family and friends that the most common board game mechanics would become more and more popular. The lollipop plot above shows that the top 20 most common game mechanics have indeed become more popular over the past few decades, which is consistent with that assumption. A likely reason is that modern board games include more mechanics and the variety of games has grown over time, so board games as a whole can appeal to a wider audience. Another likely reason is that we now have a better standard of living and more free time, and more leisure time drives demand for entertainment products such as board games.

5.3 Q2. What is the distribution of ratings for the board games?

Table 5.2: Boardgame Ratings ks.test
statistic p.value method alternative
0.02198876 1.6e-09 One-sample Kolmogorov-Smirnov test two-sided
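A test of this kind can be run with base R's `ks.test`. The following is a minimal sketch on simulated ratings, assuming the sample is compared against a normal distribution whose mean and standard deviation are estimated from that same sample; the report does not state which parameters were used, and the data frame `ratings_toy` is made up for illustration.

```r
# Simulated stand-in for the `average` rating column (made-up values).
set.seed(5521)
ratings_toy <- data.frame(average = rnorm(500, mean = 6.4, sd = 0.9))

# One-sample, two-sided Kolmogorov-Smirnov test against a normal distribution.
# Assumption: the reference mean and sd are estimated from the sample itself.
ks_result <- ks.test(
  ratings_toy$average,
  "pnorm",
  mean = mean(ratings_toy$average),
  sd = sd(ratings_toy$average)
)

ks_result$statistic
ks_result$p.value

# Optional: a one-row summary with columns like those in Table 5.2.
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(ks_result))
}
```

Because the parameters are estimated from the data being tested, the reported p-value is only approximate. With tens of thousands of observations, even small departures from normality produce very small p-values, so a histogram or Q-Q plot is a useful companion to the test.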

5.4 Q2 Conclusion

From the table above, the test statistic is 0.021989, with a corresponding p-value of 1.647e-09. Since the p-value is less than 0.05, we reject the null hypothesis that the ratings follow a normal distribution. We have sufficient evidence that the board game ratings in this sample do not come from a normal distribution.

5.5 Q3. Is there any relationship between the average playing time of a game and ownership, or the min/max players required to play that game?

5.5.1 Data preparation

  • Selecting the variables needed for the analysis from both datasets and then joining them to make the data tidy.
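This preparation step can be sketched as follows. The name `Q3_dataset` is taken from the model output later in this section; the toy data frames, the particular columns selected, and the use of an inner join on `id` are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-ins for the two source datasets (made-up values).
ratings_toy <- tibble(
  id = c(1, 2, 3),
  name = c("Game A", "Game B", "Game C"),
  average = c(7.1, 6.3, 5.8)
)
details_toy <- tibble(
  id = c(1, 2, 4),
  playingtime = c(30, 120, 45),
  minplayers = c(2, 1, 2),
  maxplayers = c(4, 4, 6),
  owned = c(1500, 300, 50)
)

# Keep only the variables needed, then join on the shared game ID.
# Assumption: an inner join, so games missing from either table are dropped.
Q3_dataset <- ratings_toy %>%
  select(id, name, average) %>%
  inner_join(
    details_toy %>%
      select(id, playingtime, minplayers, maxplayers, owned),
    by = "id"
  )

Q3_dataset
```

A `left_join` would instead keep every rated game and fill the missing detail columns with `NA`, which matters here because the two datasets do not have the same number of observations.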

5.5.2 Analysis

  • Plotting the total number of owners of each game against its average playing time to see whether there is any correlation.

Figure 5.1: Relationship between average game time and games owned

  • Checking the linear model coefficients to confirm my observation.
## 
## Call:
## lm(formula = owned ~ playingtime, data = Q3_dataset)
## 
## Coefficients:
## (Intercept)  playingtime  
##  1490.36912     -0.02701
  • Plotting separately for each minimum number of players, to see whether games that require more people have longer playing times and, in turn, fewer products sold.

Figure 5.2: Relationship between average game time and games owned, faceted by minimum number of players required

Table 5.3: Minimum number of players required to play a game vs games sold in that category
minplayers sum
0 16197
1 6801485
2 20498836
3 3295642
4 631582
5 174119
6 26102
7 4258
8 29110
10 130
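A summary like Table 5.3 can be sketched with a grouped sum. The toy data frame `Q3_toy` and its values are made up for illustration; the `sum` column mirrors the table header, while the `games` and `mean_owned` columns are additions that are not part of the original table.

```r
library(dplyr)

# Toy stand-in for the joined dataset (made-up values).
Q3_toy <- tibble(
  minplayers = c(1, 2, 2, 3, 2, 1),
  owned = c(100, 250, 400, 80, 50, 20)
)

owned_by_minplayers <- Q3_toy %>%
  group_by(minplayers) %>%
  summarise(
    sum = sum(owned),          # total copies owned in the category
    games = n(),               # number of games in the category
    mean_owned = mean(owned),  # average copies owned per game
    .groups = "drop"
  ) %>%
  arrange(minplayers)

owned_by_minplayers
```

The total depends on how many games fall into each category, so the per-game mean is a fairer basis for comparing categories of very different sizes.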

5.6 Q3 Conclusion

  • According to the plot in Figure 5.1, as the average playing time of a board game increases, the number of people who own that game tends to decrease. This matches my expectation that games which take a long time to finish might be a less popular option because people lack the time.

  • To verify this, I fitted a linear model of owned on playingtime and found that the slope was negative (about -0.027).

  • A game with a longer playing time might sell fewer copies, but it can still be popular with people who play in big groups. I expected that as the number of people required to play a game increases, the game’s sales should drop. In Figure 5.2, one can observe that as the required number of players increases, the number of products sold decreases drastically (see also Table 5.3), as smaller groups of people can more often indulge in board games.
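The sign of a slope does not show how strong or reliable the relationship is. The following is a minimal sketch of how the fitted model could be inspected further, assuming a small made-up data frame `Q3_toy` with the same two columns as the model above; the values are illustrative only and do not reproduce the reported coefficients.

```r
# Toy stand-in for Q3_dataset (made-up values).
Q3_toy <- data.frame(
  playingtime = c(15, 30, 45, 60, 90, 120, 180, 240),
  owned = c(1490, 1480, 1485, 1470, 1450, 1445, 1410, 1380)
)

# Same model form as in the analysis: owned explained by playingtime.
fit <- lm(owned ~ playingtime, data = Q3_toy)

# Intercept and slope.
coef(fit)

# Estimates with standard errors, t values and p-values.
summary(fit)$coefficients

# Share of the variation in owned explained by playingtime.
summary(fit)$r.squared
```

On the real data, a slope close to zero together with a low R-squared would suggest that playing time explains little of the variation in ownership, even when the sign is negative.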

5.7 Q4. How did the era of electronic games (starting from 2012-13) affect the number of people who want/wish/own board games?

5.7.1 Data preparation

  • Selecting the variables needed for the analysis from both datasets and then joining them to make the data tidy.

5.7.2 Data Analysis

  • Plotting a graph to see whether, over the years, the number of board games sold and the number of people wishing or wanting to own a game have dropped.
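The data behind such a graph can be prepared by summing each measure per publication year and pivoting to long form. The following is a minimal sketch, assuming a made-up data frame `details_toy` with the `yearpublished`, `owned`, `wanting` and `wishing` columns; the names `measure` and `total` are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the details data (made-up values).
details_toy <- tibble(
  yearpublished = c(2010, 2010, 2013, 2013, 2016),
  owned = c(500, 700, 900, 300, 100),
  wanting = c(50, 60, 80, 20, 10),
  wishing = c(150, 200, 260, 90, 30)
)

yearly_long <- details_toy %>%
  # Total of each measure per publication year.
  group_by(yearpublished) %>%
  summarise(across(c(owned, wanting, wishing), sum), .groups = "drop") %>%
  # Long form: one row per (year, measure) pair, ready for plotting.
  pivot_longer(
    cols = c(owned, wanting, wishing),
    names_to = "measure",
    values_to = "total"
  )

yearly_long
```

The long table can then be passed to ggplot2, for example with `geom_line()` coloured by `measure` and a `geom_vline()` marking the reference year.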

5.8 Q4 Conclusion

  • There is now almost no market for new casual board games, and even the classics sell only a fraction of what they sold annually just a few years ago. The advent of apps had a much more profound effect, devastating sales of casual board games but paradoxically increasing interest in German-style games. Many popular mobile games such as Temple Run and Subway Surfer were released around 2012-13, which led to the assumption that, as games were more easily accessible then than ever before, the digital games era must have hurt the turnover of board game publishers. In the plot, the vertical line marks the year 2013, after which there was a drastic drop in the number of people who bought new board games or who wish or want to own them. The number of people owning a game peaked around 2014 and then fell by a factor of almost five.

6 Conclusion

The top 20 most common game mechanics have become more and more popular over the past few decades, the most common being Dice Rolling. We found this through text analysis of the selected columns, with the help of functions such as separate_rows and gsub. Plotting a histogram and performing a KS test on the board game ratings led us to conclude that the ratings in this dataset do not follow a normal distribution. The average playing time of a board game and the minimum number of players required to play it are both negatively related to the number of games sold. We looked at the linear model coefficients to confirm our assumption and observation.
Data pivoting enabled us to rearrange the columns and rows so we could view the data from different perspectives. The era of mobile games appears to have affected people’s interest in board games negatively, as after 2012-13 the number of games owned/wanted/wished for/traded dropped drastically.

7 References

Harmon J (2022). Tidy Tuesday. Retrieved from https://www.tidytuesday.com/

Lüdecke et al. (2021). performance: An R Package for Assessment, Comparison and Testing of Statistical Models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139

Robinson D (2022). drlib: Personal R package of David Robinson. R package version 0.1.1.

Robinson D, Hayes A, Couch S (2022). broom: Convert Statistical Objects into Tidy Tibbles. R package version 1.0.0, https://CRAN.R-project.org/package=broom.

Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC, Florida.

tidytuesday/data/2022/2022-01-25 at master · rfordatascience/tidytuesday. (2022). Retrieved from https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-01-25

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.

Wickham H (2022). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.1, https://CRAN.R-project.org/package=stringr.

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686.

Wickham H, Girlich M (2022). tidyr: Tidy Messy Data. R package version 1.2.0, https://CRAN.R-project.org/package=tidyr.

Xie Y (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39.

Yermolaieva S, Brown JA (2017). Dice design deserves discourse. Game & Puzzle Design, 3(2), 64-70.

Zhu H (2021). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.3.4, https://CRAN.R-project.org/package=kableExtra.